Comprehensive microRNA-seq transcriptomic profiling across 11 organs, 4 ages, and 2 sexes of Fischer 344 rats

The rat is one of the most widely used models in chemical safety evaluation and biomedical research. However, knowledge of its microRNA (miRNA) expression patterns across multiple organs and developmental stages is still limited. Here, we constructed a comprehensive rat miRNA expression BodyMap using a diverse collection of 320 RNA samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats, with four biological replicates per group. Following the Illumina TruSeq Small RNA protocol, an average of 5.1 million 50 bp single-end reads was generated per sample, yielding a total of 1.6 billion reads. The quality of the resulting miRNA-seq data was judged to be high based on assessments of raw sequences, mapped sequences, and biological reproducibility. Importantly, aliquots of the same RNA samples have previously been used to construct the mRNA BodyMap. The miRNA-seq dataset presented here, together with the existing mRNA-seq dataset from the same RNA samples, provides a unique resource for studying the expression characteristics of existing and novel miRNAs and for integrative analysis of miRNA-mRNA interactions, thereby facilitating better utilization of rats for biomarker discovery.

Overview of study design. A total of 320 RNA samples were collected from 16 female rats and 16 male rats of the Fischer 344 strain, comprising four rats of each sex at each of the four developmental stages and ten organs per rat (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testis or uterus). Aliquots of the same RNA samples were previously used to construct the mRNA dataset 24,25 (a), which makes it possible to integrate the miRNA data with the mRNA data. (b) Schematic overview of the miRNA-seq workflow: miRNA libraries were prepared and sequenced for each of the 320 samples. For miRNA quantification, adapter sequences were removed, followed by a read-quality filter. High-quality reads were then mapped to miRBase, piRBase, GtRNAdb, and the NCBI rat transcriptome and genome. In addition, reads from all 320 samples were pooled for novel miRNA discovery.
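The sample count follows directly from the design described above. The short Python sketch below enumerates the design to make the arithmetic explicit. The organ abbreviations are those used elsewhere in this paper; the stage labels and the tuple layout are illustrative assumptions, not the authors' sample-naming scheme.

```python
from itertools import product

# Nine organs collected from every rat, plus one sex-specific organ.
ORGANS_SHARED = ["Adr", "Brn", "Hrt", "Kdn", "Lng", "Lvr", "Msc", "Spl", "Thm"]
SEX_SPECIFIC = {"F": "Utr", "M": "Tst"}
STAGES = ["juvenile", "adolescent", "adult", "aged"]  # illustrative labels
REPLICATES = 4  # biological replicates per sex and stage


def enumerate_samples():
    """Return one (organ, sex, stage, replicate) tuple per RNA sample."""
    samples = []
    for sex, stage, rep in product(SEX_SPECIFIC, STAGES, range(1, REPLICATES + 1)):
        # Each rat contributes ten organs: nine shared plus testis or uterus.
        for organ in ORGANS_SHARED + [SEX_SPECIFIC[sex]]:
            samples.append((organ, sex, stage, rep))
    return samples


samples = enumerate_samples()
n_rats = len({(sex, stage, rep) for _, sex, stage, rep in samples})
n_organs = len({organ for organ, _, _, _ in samples})
print(len(samples), n_rats, n_organs)
```

Eleven distinct organs appear across the dataset, but each rat contributes only ten, which is why the total is 32 rats × 10 organs rather than 32 × 11.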
Pre-processing and processing of the reads. The adapter sequences were removed with fastp 0.23.2 (http://opengene.org/fastp/fastp.0.23.2), followed by a quality filter and a length filter. Reads with more than 2 "N" bases were discarded. The clipped reads with length between 16
Filtering of miRNAs and samples. The following strategies were used to filter miRNAs and samples of questionable quality: (1) lowly expressed miRNAs, with less than one count per library on average, were discarded; (2) a sample was removed if it failed to cluster with other samples from the same organ type in a hierarchical clustering analysis. The samples Spl_F_104_4 and Brn_M_006_3 were removed, as they clustered with uterus samples instead of spleen and brain samples, respectively. Ultimately, a miRNA expression matrix with 604 miRNAs and 318 samples was obtained for further analysis.
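Two of the filters described above are simple enough to state as code. The following Python sketch illustrates the "N"-base read filter and the low-expression filter (rule 1) on toy data. The function names and the dictionary layout are assumptions made for illustration and do not reproduce the authors' pipeline, in which read filtering was performed with fastp.

```python
def has_too_many_n(read, max_n=2):
    """True if the read contains more than `max_n` ambiguous 'N' bases."""
    return read.upper().count("N") > max_n


def filter_low_expressed(counts, min_mean=1.0):
    """Keep miRNAs whose mean count per library is at least `min_mean`.

    `counts` maps a miRNA name to its list of raw counts, one per library.
    """
    return {
        mirna: values
        for mirna, values in counts.items()
        if sum(values) / len(values) >= min_mean
    }


# Toy reads: the second has three 'N' bases and is discarded.
reads = ["ACGTNNACGT", "ANNNGTACGT"]
kept_reads = [r for r in reads if not has_too_many_n(r)]

# Toy count matrix with hypothetical miRNA names and four libraries.
toy_counts = {
    "miR-a": [0, 1, 0, 2],  # mean 0.75, discarded
    "miR-b": [5, 3, 8, 4],  # mean 5.0, kept
    "miR-c": [1, 1, 1, 1],  # mean 1.0, kept
}
kept_mirnas = filter_low_expressed(toy_counts)
print(kept_reads, sorted(kept_mirnas))
```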
Identification of novel miRNA candidates. First, reads from all 320 samples were pooled for identification of novel miRNA candidates. Using the mapper module provided within miRDeep2.0.1.3, raw reads of the merged samples were subjected to a series of stringent filters (discarding low-quality reads and reads with fewer than 18 nt after clipping the 3′ adapter), and the remaining sequences were then mapped to the rat genome reference (Rn7). Next, the mapped reads were submitted to the miRDeep2 module to detect novel miRNAs with default parameters. Predictions that passed the stringent filters (miRDeep2 score ≥10, a significant randfold P-value, and no Rfam alert indicating a possible rRNA) were selected as potential novel miRNA candidates.
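The three selection criteria above amount to a conjunction of simple tests on each prediction. The Python sketch below applies them to toy records. The field names (`score`, `randfold_significant`, `rfam_alert`) and the record identifiers are hypothetical and do not correspond to the column headers of the miRDeep2 output file, which would have to be parsed and mapped onto this structure first.

```python
def is_novel_candidate(record, min_score=10.0):
    """Apply the three filters: score, randfold significance, no Rfam alert."""
    return (
        record["score"] >= min_score
        and record["randfold_significant"]
        and record["rfam_alert"] is None
    )


# Toy predictions with hypothetical identifiers.
predictions = [
    {"id": "cand_1", "score": 25.3, "randfold_significant": True, "rfam_alert": None},
    {"id": "cand_2", "score": 9.9, "randfold_significant": True, "rfam_alert": None},
    {"id": "cand_3", "score": 40.0, "randfold_significant": False, "rfam_alert": None},
    {"id": "cand_4", "score": 18.0, "randfold_significant": True, "rfam_alert": "rRNA"},
]

candidates = [p["id"] for p in predictions if is_novel_candidate(p)]
print(candidates)
```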
To validate the organ specificity of the novel miRNA candidates, we quantified their expression across the 320 samples and performed differential expression analysis between every pair of organs at each developmental stage. A miRNA was considered 'organ-specific' if it was over-expressed by at least 1.5 fold (fold-change > 1.5 and adjusted P-value < 0.05) in one organ relative to all other organs at all four developmental stages. Finally, the 12 organ-specific sequences were reported as novel miRNA candidates in Online-only Table 1.
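The organ-specificity rule above requires every pairwise comparison to pass at every stage. A minimal Python sketch of that rule for a single miRNA is given below. It assumes the pairwise differential expression results have already been computed and are stored in a dictionary keyed by (stage, organ, other organ); this layout, the function name, and the toy values are illustrative assumptions.

```python
def is_organ_specific(de_results, organ, organs, stages,
                      min_fold_change=1.5, max_adj_p=0.05):
    """True if `organ` passes both thresholds against every other organ
    at every stage. A missing comparison counts as a failure."""
    for stage in stages:
        for other in organs:
            if other == organ:
                continue
            result = de_results.get((stage, organ, other))
            if result is None:
                return False
            fold_change, adj_p = result
            if not (fold_change > min_fold_change and adj_p < max_adj_p):
                return False
    return True


# Toy example: three organs, two stages, one miRNA.
organs = ["Brn", "Lvr", "Spl"]
stages = ["juvenile", "adult"]
de_results = {}
for stage in stages:
    for other in ["Lvr", "Spl"]:
        de_results[(stage, "Brn", other)] = (3.0, 0.001)  # brain clearly higher
    de_results[(stage, "Lvr", "Brn")] = (0.3, 0.001)      # liver lower than brain
    de_results[(stage, "Lvr", "Spl")] = (2.0, 0.01)

brain_specific = is_organ_specific(de_results, "Brn", organs, stages)
liver_specific = is_organ_specific(de_results, "Lvr", organs, stages)
print(brain_specific, liver_specific)
```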

Data Records
The miRNA-seq dataset generated in this study is available in the NCBI Gene Expression Omnibus (GEO) under series accession number GSE172269 36. This accession contains both the raw sequence data files (fq.gz format) and the processed data files (raw counts of mapped sequencing reads) used in this report. All data can be used without restrictions.
www.nature.com/scientificdata
To independently validate the organ-enriched miRNAs discovered in this study, the literature-reported miRNA profiles of 55 different organs and tissues from normal male rats, based on Agilent miRNA microarrays, were used 15. Thirty-one (31) organ-enriched miRNAs identified in our study, including brain-enriched, testis-enriched, liver-enriched, spleen-enriched, and heart-enriched miRNAs, were also profiled in the miRNA microarray dataset and displayed a pattern of high-level expression in biological systems with functions similar to those of the organs examined in this study (Fig. 5b). This result confirmed the reliability of the organ-specific signature presented in our dataset.
15 are arranged by biological system. Each row represents a miRNA that is enriched in an organ. Each column represents a sample profiled in the microarray dataset. Abbreviations for organs: Adr, adrenal; Brn, brain; Hrt, heart; Kdn, kidney; Lng, lung; Lvr, liver; Msc, skeletal muscle; Spl, spleen; Thm, thymus; Tst, testis; and Utr, uterus.
appeared that there is a more dynamic miRNA expression pattern across the four developmental stages independent of organ types, whereas the temporal expression of mRNA appeared to be more organ dependent than that of miRNA expression. These results highlight the concordance of the two datasets.
Meanwhile, the genomic loci of miRNA genes were used to test the reliability of the association between the miRNA and mRNA datasets. We calculated the correlations between the expression levels of miRNAs and those of genes (miRNA-gene pairs). The distributions of correlations for several types of miRNA-gene pairs are shown in Fig. 6b. The correlations across all miRNA-gene pairs were weak (0.01 ± 0.28, median ± sd), as expected. Importantly, co-transcription of miRNAs and their host genes was observed in these datasets, as the expression of intragenic miRNAs was highly positively correlated with that of their host genes on the same strand (0.60 ± 0.29, N = 179). In addition, the slightly negative correlations (−0.17 ± 0.35) for miRNA-host gene (opp) pairs confirmed that co-expression is rare when intragenic miRNAs and their host genes are located on opposite strands. In summary, the miRNA dataset together with the mRNA dataset provides a reliable and unique resource for integrative analysis.
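The pairwise analysis above can be sketched as follows: compute one correlation coefficient per miRNA-gene pair across matched samples, then summarize each group of pairs as median ± sd. The text does not state which correlation coefficient was used, so the Pearson coefficient and the sample standard deviation in this Python sketch are assumptions, and the expression values are toy numbers rather than data from this study.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def summarize(correlations):
    """Return (median, sample standard deviation) of a list of correlations."""
    return statistics.median(correlations), statistics.stdev(correlations)


# Toy profiles across four matched samples.
mirna = [1.0, 2.0, 3.0, 4.0]
host_same_strand = [2.1, 3.9, 6.2, 8.0]  # rises with the miRNA
host_opposite = [4.0, 3.5, 3.6, 1.0]     # falls as the miRNA rises

r_same = pearson(mirna, host_same_strand)
r_opp = pearson(mirna, host_opposite)
median_r, sd_r = summarize([0.5, 0.7, 0.6])
print(round(r_same, 3), round(r_opp, 3), median_r, round(sd_r, 3))
```

In practice, one list of coefficients would be built per pair type (all pairs, same-strand host pairs, opposite-strand host pairs) and each list passed to `summarize`.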